perm filename IIA3.PUB[NSF,MUS] blob sn#096502 filedate 1974-04-10 generic text, type C, neo UTF8
.SELECT A
3. TOWARDS A GENERAL MODEL FOR SIMULATION
.SELECT C
.GROUP SKIP 1
CURRENT RESEARCH
.SELECT 1
.BEGIN FILL ADJUST
We will  here discuss  an important  aspect of  our current  research
which  is directed  toward  our ultimate  aim: the  development  of a
general model for the computer simulation of natural tones.   The two
approaches  to  simulation  described  above,    one  using  additive
synthesis   based  upon  analysis  and   the  other  using  frequency
modulation  synthesis,   have  not  proceeded  independently  of  one
another,   but  interactions  have occurred  at several  levels.   An
example is provided by  the initial use  of frequency modulation  for
the simulation of a  brass tone which was strongly  influenced by the
research  of Risset (1966)  on brass tones,  using additive synthesis
based on analysis.

The eventual development of a general model for simulation will be an
outgrowth  of  the  interdependency  and  convergence  of  these  two
approaches.  Examples  are given where  findings using one  technique
are applied and tested with the other, providing a cross-verification
of research discoveries. Moreover, the particular advantages of each
technique influence the direction of research using the other
technique, and, in this way, both approaches are in the process of
converging on  a  single, more  powerful  and general  model.    This
resultant  model for  simulation  will have  the  advantages of  both
methods: the simplicity and perceptual meaningfulness of user-control
over the FM technique, and the wide range of complex cases of natural
tones handled by the additive technique.
.GROUP SKIP 2
%5interactions between additive and FM syntheses%1

One example of the  interactions which have occurred between  the two
approaches discussed above, one using additive synthesis based on the
analysis of  real  tones and  the  other using  frequency  modulation
synthesis,  is  the  initial  discovery  of the  potency  of  the  FM
technique   to  synthesize  periodic   music  instrument  tones.  The
translation of the  distinctive cues for  brass instruments found  by
Risset (1966) - which he derived from analysis-based and data-reduced
additive synthesis techniques largely analogous to those which we are
using - into  the parameters for  FM synthesis resulted  in amazingly
successful simulations of  brass tones.  This provided a confirmation
of the nature of the perceptual features seen for the brass family of
instruments.  It also  indicated the  power  of the  FM technique  to
simulate important perceptual attributes of tone. 
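
The FM technique referred to here is simple frequency modulation: one sinusoidal carrier whose phase is modulated by one sinusoidal modulator. A minimal sketch follows; the 1:1 carrier-to-modulator ratio and the index that tracks the amplitude envelope reflect the brass cue discussed above, but all numeric values are illustrative rather than taken from the simulations described in the text.

```python
import math

def fm_tone(fc, fm, index_peak, dur, sr=8000):
    """Simple FM: y(t) = a(t) * sin(2*pi*fc*t + i(t) * sin(2*pi*fm*t)).
    A 1:1 carrier-to-modulator ratio gives harmonic partials; letting
    the modulation index i(t) track the amplitude envelope a(t) widens
    the spectrum as the tone grows louder - the brass-like cue noted
    in the text. Values here are illustrative."""
    n = int(dur * sr)
    attack = int(0.1 * sr)                       # linear attack/decay
    out = []
    for k in range(n):
        t = k / sr
        a = min(k / attack, (n - 1 - k) / attack, 1.0)
        i = index_peak * a                       # index follows amplitude
        out.append(a * math.sin(2 * math.pi * fc * t
                                + i * math.sin(2 * math.pi * fm * t)))
    return out

samples = fm_tone(fc=440.0, fm=440.0, index_peak=5.0, dur=0.5)
```

Detuning the ratio slightly (for example, fm = 1.001 * fc) makes the partials inharmonic, which bears on the violin-attack finding discussed in this section.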

Direct interactions between  techniques have ensued in  our research.
Salient  features for several instruments  discovered in our approach
using additive synthesis have been translated into the  FM technique,
providing a  confirmation of  the importance  of the  suggested cues.
Modulations  which occur in certain brass instruments, especially the
French horn, have  been found  to be critical  using both methods  of
synthesis.    The successful  simulation  of  the  violin tone  using
additive synthesis was found to  necessitate the preservation of  the
inharmonicity among the partials  in the attack.  The  application of
this finding to FM synthesis was noted above, and puts a new level of
timbral complexity  within  the reach  of  the  latter method.    The
success found in  using any sort of inharmonicity  for the simulation
of  the  quality  of the  violin  attack gives  us  insight  into the
critical  feature  of  that  attack  and  the  range  of  alternative
techniques which can be used to generate it - none of which
duplicates the actual acoustical waveform of the real tone! Many
other applications of findings  from the additive approach to  FM are
in progress, which also include the reed family of instruments.

We  find  special   significance  in  the  convergence  of   the  two
approaches, as it  relates to the development of a more general model
for simulation. The simplicity and perceptual meaningfulness of
specifications to the frequency modulation technique point out an
important goal for the additive synthesis method.  On the other hand,
the complexities of tone which  are revealed by analysis,   and which
are confirmed  to be perceptually salient in  the additive synthesis,
point out necessary levels of complexity which must be accommodated by
the frequency modulation  technique. As the latter technique  is then
made  more  complex,   it in  fact  enters the  category  of additive
synthesis.   The ultimate  model for  simulation will  draw from  the
research findings using both methods.  
.END
.GROUP SKIP 2
.SELECT C
PROPOSED RESEARCH
.GROUP SKIP 1
.SELECT 1
.BEGIN FILL ADJUST
We  will  here  discuss the  proposed  research  which  is  centrally
concerned with  approaches to our ultimate aim:  the development of a
powerful, general,  and  easily-controlled algorithm  for  simulation
which is based on a comprehensive perceptual model for natural tones.
We  first will mention our  intention to explore  the possible use of
subtractive synthesis in  the simulation of natural  music instrument
tones.    We anticipate  that  this method  will  be  useful for  the
simulation of percussion instrument tones such as drum and cymbal. In
these  cases,  one  advantage  of  the  subtractive  method  is  that
inharmonic partials or  even wide-band noise may be easily introduced
into  the sound.  When  simulating  instruments  with  certain  fixed
resonances,  one  or  more  filters  could  be  positioned  at  these
resonances regardless of the fundamental pitch period of the exciting
waveform.   Other applications  will be explored,  and a  significant
third  approach  may  result, which  would  have  further  advantages
besides  those of  the above two,  additive and  frequency modulation
syntheses, and which  would be integrated  into our approach  towards
a general model for simulation.

We will next describe experimental procedures used to investigate the
perceptual  processing of instrument  tones, since, to  assist in the
development of a general simulation algorithm,  we  must formulate  a
general  model for  the  perception of  timbre.   This  will  provide
important  information  for  the  construction of  perceptually-based
higher-order simulation algorithms.   We employ  a spatial model  for
the  subjective structure  of  the  perceptual relationships  between
signals.   In particular, multidimensional scaling techniques will be
discussed.  Research is directed at uncovering  the dimensionality of
the  subjective  space, the  psychophysical  relationships which  are
structurally correlated to  this space,   and the  properties of  the
space.  The  existence of such constraints as  categorical boundaries
will  be investigated in an  attempt to assess the  continuity of the
subjective space for timbre.   Of interest is the possible  existence
of a categorical mode of perception  for musical sounds,  as has been
claimed for speech (Liberman et al., 1967). In the same regard,
we will also  examine the effects of  musical training or context  on
the  structure of  the space.   The  model will  be evaluated  by our
ability to predict  the mappings of  real and novel  tones.  For  the
purpose of  investigating the  properties of  a subjective space  for
timbre,  and  for testing  the  existence of  a  categorical  mode of
perception, we are designing algorithms which produce new tones whose
physical properties lie between two known music instrument tones.  An
example of  one algorithm, designed for additive synthesis based upon
the data-reduced analysis of real  tones, is shown in Figure 13, and
can be heard in Recorded Example 3.

Based on these findings,  we plan the design of an algorithm which maps
the dynamic  spectra of real tones into  the FM parameters and time
functions.  The ability to write such an algorithm would indicate our
success  at having  identified the  perceptual dimensions  of timbre.
This is an important step in the convergence of synthesis techniques
toward our ultimate goal stated above.
.END
.SELECT 5
.GROUP SKIP 2
exploration of subtractive synthesis techniques
.SELECT 1
.BEGIN FILL ADJUST
A form of synthesis which we have not yet discussed, and which is in
fact the only major class of sound synthesis not yet covered by our
research, is that of subtractive synthesis. The procedure here is to
take a simple signal with a wide bandwidth,  such as a pulse train or
a  band-limited sawtooth wave, and apply  spectral shaping filters to
produce the desired partial tone amplitudes. We have not  as yet used
this form of synthesis, but  intend to do so in the near future. This
is the type of synthesis most commonly used in vocoders.  We may thus
assimilate the techniques  of analysis and synthesis of  human speech
and  apply them  in a more  general context.  Two of the  most useful
methods seem to be the linear predictor (Atal & Schroeder, 1970; Atal
& Hanauer, 1971; Markel, 1972) and the homomorphic vocoder (Oppenheim
& Schafer, 1968; Oppenheim, 1969; Miller, 1973).
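
A band-limited sawtooth of the kind mentioned above can itself be generated additively, by summing harmonics with 1/k amplitudes and truncating the series below the Nyquist frequency. A minimal sketch, with illustrative sampling rate and pitch:

```python
import math

def bl_sawtooth(f0, dur, sr=8000):
    """Band-limited sawtooth: harmonics at k*f0 with amplitude 1/k,
    truncated below the Nyquist frequency (sr/2) so that no partial
    aliases. Such a wideband source is then spectrally shaped by
    filters in subtractive synthesis."""
    nharm = int((sr / 2) // f0)          # highest harmonic below Nyquist
    n = int(dur * sr)
    out = []
    for i in range(n):
        t = i / sr
        s = sum(math.sin(2 * math.pi * k * f0 * t) / k
                for k in range(1, nharm + 1))
        out.append(s * (2 / math.pi))    # conventional sawtooth scaling
    return out

wave = bl_sawtooth(440.0, 0.01)
```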

Generally, the technique would be as follows. A musical instrument
tone would be analyzed at discrete intervals, for instance, every 5
milliseconds. At each analysis point, we compute a filter
whose frequency response approximates the spectral shape of the input
waveform  in the interval around the  analysis point. To resynthesize
the signal, we filter  a pulse train, updating the  filter parameters
at  each analysis point.  In the  case of  the linear  predictor, the
filter is an all-pole filter. For the homomorphic vocoder, the filter
is an all-zero filter. Since these methods are well documented in the
literature, we shall not explain them here.
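
For the linear-predictor case, the analysis-resynthesis loop just outlined can be sketched as follows. The Levinson-Durbin recursion below is the standard autocorrelation method; the frame length, predictor order, and choice of excitation are left open, and this is a sketch of the published technique, not a procedure we have yet implemented.

```python
def lpc_coeffs(frame, order):
    """Linear-prediction coefficients for one analysis frame, via the
    autocorrelation method and the Levinson-Durbin recursion. The
    resulting coefficients define the all-pole filter of the text."""
    r = [sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a = new_a
        err *= (1 - k * k)
    return a[1:], err          # predictor coefficients, residual energy

def resynthesize(coeffs, excitation):
    """Drive the all-pole filter with an excitation: a pulse train for
    periodic tones, or pulses mixed with noise for percussive ones.
    In practice the coefficients are updated at each analysis point."""
    out = []
    for x in excitation:
        y = x + sum(c * out[-(j + 1)] for j, c in enumerate(coeffs)
                    if j + 1 <= len(out))
        out.append(y)
    return out
```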

We anticipate that  the linear predictor will be useful for analyzing
percussion instrument tones such as drum and cymbal. In  these cases,
the excitation might be modeled as band-limited noise, rather than an
impulse  train,  the spectral  shaping  being applied  by  the filter
produced by the linear prediction algorithm. Although  the same thing
could be  done with the homomorphic vocoder,  a difficult convolution
is then required.  

The hope is  that by using  time-varying filters with both  poles and
zeros, a lower-order  filter may be used. At present, the only way of
automatically producing  the parameters  for such  a filter  directly
from digitized sound is by time-consuming optimization techniques. We
propose to determine the parameters initially in much the same way as
one  determines  the modulation  indices  for  FM  instruments.  This
involves  studying  the results  of  heterodyne  filter analysis,  the
manual preparation of parameter trajectories, and the testing of  the
results by listening to and analyzing the waveform synthesized by the
prepared   parameters.  We  hope  to   determine  whether  economical
subtractive  synthesis  can  be  realized  for  a  wide   variety  of
instruments,  and  whether  an  efficient  automatic  method  can  be
determined for the calculation of time-varying filter parameters.

One  advantage of the subtractive method  is that inharmonic partials
or even wide-band  noise may be introduced  into the sound by  adding
such excitation to the driving pulse train. The total excitation will
then be  passed  through the  filter  and will  experience  the  same
spectral shaping that a pure pulse train would experience.

When simulating  instruments with  certain fixed  resonances, one  or
more  filters could be  positioned at these  resonances regardless of
the fundamental  pitch period of  the exciting  waveform. This  means
that the  size of the multiple  tables of parameters as  functions of
the fundamental pitch period might be greatly reduced.

In the event that analysis-based subtractive synthesis proves to be a
tool for the extension of the set of tones which  we can simulate, it
of  course will  be included  in the  research program.    The future
research which  follows would  in that  case incorporate  the use  of
subtractive synthesis  in addition to  the additive and  FM synthesis
techniques that we have found useful to date.
.END
.NEXT PAGE
.SELECT 5
applications of multidimensional scaling to timbre perception
.SELECT 1
.BEGIN FILL ADJUST
A  spatial   model  will   be  employed   to  represent  the   judged
relationships  between  music  instrument tones  for  the  purpose of
uncovering the perceptual dimensions of timbre.  If the reader is not
familiar with the  computer-based multidimensional scaling techniques
discussed  below, it would be  most instructive at  this time to read
Appendix B, which  introduces the basic concepts  of multidimensional
scaling and discusses the specific algorithms we will use.
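
To give the flavor of the computation before turning to Appendix B: classical (Torgerson) scaling recovers coordinates from a matrix of judged dissimilarities by double-centering the squared distances and taking eigenvectors. The sketch below recovers only the first dimension by power iteration; MDSCAL and INDSCAL, the algorithms we will actually use, are more elaborate (nonmetric and individual-differences) procedures.

```python
def classical_mds_1d(d, iters=200):
    """First dimension of classical (Torgerson) multidimensional
    scaling: double-center the squared-distance matrix and extract
    the dominant eigenvector by power iteration. d is a symmetric
    matrix of judged dissimilarities between tones."""
    n = len(d)
    d2 = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]
    tot = sum(row) / n
    # B = -1/2 * (d2[i][j] - row mean i - row mean j + grand mean)
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + tot) for j in range(n)]
         for i in range(n)]
    v = [1.0] + [0.0] * (n - 1)          # power iteration for top eigenpair
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [lam ** 0.5 * x for x in v]   # coordinates on dimension 1

# three hypothetical tones whose dissimilarities happen to be one-dimensional
dissim = [[0.0, 1.0, 3.0],
          [1.0, 0.0, 2.0],
          [3.0, 2.0, 0.0]]
coords = classical_mds_1d(dissim)
```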

Multidimensional  scaling  is  initially  useful  for  exploring  the
psychophysical  relationships involved  in the perception  of timbre,
that is,  the  relationships between  the  subjective,  psychological
qualities of  tones and their physical  properties. Interpretation of
the  perceptual configuration in  terms of physical  attributes or an
actual  correlation   of  the  subjective   dimensions  to   physical
dimensions in  the signals is  most desirable. Various  attempts have
been made in  the recent  past.  Plomp  (1970) and  Pols (1970)  have
employed  the  MDSCAL  algorithm to  investigate  the  perception  of
steady-state  auditory  signals  generated  from  single  periods  of
musical tones and vowels.   Although several problems exist with  the
specific  assumptions of  their  approach,   the most  unsatisfactory
aspect  is the  restrictive definition  of timbre which  excludes the
temporal qualities  of sound.   Wessel  (1973b) has  used MDSCAL  and
INDSCAL in a study of  perceived and imagined relationships for a set
of 9 music instrument tones.  Two-dimensional spatial representations
were  interpretable with  respect to  the  spectral distribution  and
temporal  relations  in the  onsets  of components  of  the  tones as
analyzed by speech spectrography. The results of Wessel's preliminary
study were consistent with a pilot experiment which we conducted with
14  music instrument tones last year.   Both indicate great potential
for multidimensional scaling and the fruitfulness  of its application
to a larger range of timbres.

We feel that there is special potential  in the perceptual scaling of
the  computer-simulated music instrument tones  described above.  The
ability  to  independently  control  the  pitches,  loudnesses,   and
durations  of  tones  that  are  synthesized  makes  it  possible  to
experimentally account  for these dimensions in the stimuli.  This is
a non-trivial problem for experiments on timbre perception!

In the  case of additive  synthesis, tones which  are indiscriminable
from  the original  recordings and  which  are synthesized  from very
reduced data structures give the investigator a powerful advantage in
the interpretation of psychophysical relationships.  Not only are the
physical  parameters of the signals completely  known,  but they have
been simplified to the extent  that they are more easily  dealt with.
Nonessential    physical    characteristics    have   been    removed
independently,  reducing the  number of possible physical  attributes
which the  investigator  must consider  in an  interpretation of  the
perceptual  space.  The preliminary  success of filtering experiments
for the localization  of perceptual  cues for source  identification,
mentioned earlier  for additive synthesis,  suggests the value  of an
extended study  using a wide set  of signals.  We  plan to study both
confusions in  identification and  perceived similarity  for sets  of
instrument   tones  presented   in   various  conditions,   including
filtering.

Likewise, tones generated by frequency modulation synthesis provide a
means  of controlling significant physical  cues with a  small set of
parametric specifications.   A  rich timbre  space  results from  the
manipulation  of  a  few  basic controls  which  affect  the  dynamic
evolution,  bandwidth,   and frequency  ratios of  the partials  of a
synthesized complex  tone which  has many of  the characteristics  of
natural tones. Much insight may be gained from the perceptual scaling
of such a  simply-specified but multidimensionally  rich space.   The
extension of FM into wider ranges of  the timbre space, including the
non-periodic music  instrument tones, provides obvious advantages for
scaling.

Various approaches  will be taken  to examine specific  properties of
the  subjective space for  timbre.   Predictions for mappings  of new
tones will  be made in  order to  confirm our  interpretation of  the
physical correlations to  the perceptual space.   We are particularly
interested in the perception of tones which the listener might not be
already familiar with,  such as real ethnic  instruments or syntheses
of tones  which have no analog  in the real world.   The influence of
familiarity with instruments and  musical training on the  perception
of timbre will be explored.  

Another interest is the continuity of the space, which centers on the
nature  of the regions  which lie in  between a mapped  set of tones.
Especially important  is  the consideration  of  the effects  of  the
categorical  identification  of  instruments  on  the  perception  of
simulated  tones  which might  physically  lie in  between  the known
tones.   Equal steps  along acoustical  dimensions might  not map  as
equal steps  along respective perceptual dimensions,   because of the
influence on perception of the  tendency to categorize input  signals
with  respect  to  their  sources  of  origin.    We  describe  below
independent  perceptual testing  for the  existence of  a categorical
mode of perception for timbre.  Multidimensional scaling will also be
used to explore the space in between the known points.
.END
.NEXT PAGE
.SELECT 5
investigation of categorical perception
.SELECT 1
.BEGIN  FILL ADJUST  
As  an independent  test  for  the  possible existence  of  cognitive
constraints  on the perception  of simulated  music instrument tones,
we will employ procedures to  examine the existence of a  categorical
mode of perception.   Researchers at Haskins  Labs report categorical
effects  for certain speech sounds.   Equal steps along an acoustical
dimension, interpolated between the modeled physical
characteristics of two or more known speech sounds, are employed as
stimuli.  Listeners  identify the  stimuli as  falling into  definite
categories having narrow overlap.  In addition,  the discriminability
for  pairs of stimuli, all  of which are equally  separated along the
physical dimension,  is affected by  their position  with respect  to
these categorical boundaries.  If both  members of a pair fall within
a  single category,  they are discriminated  more poorly than if they
fall within different categories.  The Haskins group has used this as
evidence for  a very special mode  of perception for  speech, and the
failure of other  researchers to find  a similar interaction  between
categorical identification and discrimination of stimuli is presented
in support of their theory (see Liberman et al., 1967; Liberman,
1972).

We are interested in testing for the existence of categorical effects
in the perception of another set of stimuli which are in the same
sensory domain and are of complexity comparable to the speech sounds.
Primarily  we  are   concerned  with  perceptual  constraints   on  a
simulation  algorithm,  but  obvious fallout  will occur  for general
theories of perception.   Our specific procedure  will be similar  to
the Haskins  approach. Identification functions  will be  compared to
discrimination  functions for  a set of  stimuli consisting  of tones
physically interpolated  between  two  known  end-points,  through  a
multidimensional physical  space.   Discrimination functions  will be
derived by means of the `same-different' task discussed above.

We  have been  able to  produce interpolations  with both  methods of
synthesis.      Additive   synthesis   with   data-reduced   physical
representations  of  tones  has  given  us  a  potential  method  for
interpolating between  known  sounds,   and  the  results  have  been
strikingly successful. The algorithm has been based on the physical
properties of tones. It consists of interpolations between the
two-dimensional locations (time vs. amplitude or frequency) of joints
in the  three line-segment representations of  the parallel functions
for  two known tones.  An example  of a set of interpolations between
the violin tone, discussed above,  and an alto saxophone tone  at the
same  pitch and  duration  is shown  in  Figure 13.   A  particularly
significant finding  thus far  is that  the set  of tones  which  are
produced in  this manner are  identified as being  either one  or the
other  of  the  two  known  instruments,  and  that  the  categorical
boundaries have been  rather sharp.   Frequency modulation  synthesis
provides another approach for  interpolating between sounds. It gives
us  a set of parameters which can  be systematically altered from one
value to another. Two shapes of control function, used for amplitude
or index, can be interpolated between in various ways, in the manner
explained above for interpolating between functions in additive
synthesis. Successful interpolations have also
been  produced with this approach,   including interpolations between
periodic and nonperiodic sounds.  
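
The joint-interpolation scheme for the additive case can be sketched as follows. The two envelopes below are illustrative stand-ins, not the analyzed violin and alto saxophone data of Figure 13.

```python
def interpolate_envelopes(env_a, env_b, alpha):
    """Interpolate between two line-segment envelopes, each given as a
    list of (time, value) breakpoints ('joints'), by moving each joint
    part of the way from envelope A toward envelope B. alpha = 0 gives
    A, alpha = 1 gives B. Assumes matched analyses, i.e. both
    envelopes have the same number of joints, as for two tones at the
    same pitch and duration."""
    assert len(env_a) == len(env_b)
    return [((1 - alpha) * ta + alpha * tb, (1 - alpha) * va + alpha * vb)
            for (ta, va), (tb, vb) in zip(env_a, env_b)]

# illustrative amplitude envelopes for one partial of each tone
violin_amp = [(0.00, 0.0), (0.08, 1.0), (0.40, 0.8), (0.50, 0.0)]
sax_amp    = [(0.00, 0.0), (0.02, 1.0), (0.45, 0.9), (0.50, 0.0)]

# a family of intermediate tones: one envelope per interpolation step
steps = [interpolate_envelopes(violin_amp, sax_amp, a / 4) for a in range(5)]
```

Applied in parallel to the amplitude and frequency functions of every partial, this yields the physically intermediate stimuli used in the identification and discrimination tests.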

We must emphasize that we are  only in the early stages of  exploring
the  interpolation  between  sounds.    We  are  presently  employing
algorithms  based on the  physical structures of tones,  and will map
these into a perceptual space, both using multidimensional scaling
and the identification-discrimination procedure to test for
categorical perception.  It is clear to us from our investigation  to
date that we have uncovered very useful  tools for the examination of
properties  of the  perceptual  space for  timbre with  respect  to a
categorical mode of perception.
.END
.GROUP SKIP 2
.SELECT 5
automatic FM mappings from analyzed tones and the convergence of approaches
.SELECT 1
.BEGIN FILL ADJUST
We see the automatic generation of parametric and functional data for
FM simulations  of instrument tones to be  a significant step towards
the development  of a  general model  for  simulation. The  potential
increase in  facility and  knowledge from higher-order  algorithms is
great.  The difference  in the  two synthesis techniques  provides us
with  a  powerful means  for  evaluating  hypotheses  concerning  the
synthesis of  timbres because of  the difference of  spectral control
between the two: the additive technique allows independent control of
the amplitude of each of the components in time, whereas the FM
technique allows control of only the bandwidth. Based on the models
for timbre perception which we are able to formulate with the  use of
multidimensional scaling,   we propose to  construct algorithms which
map  into FM data, first, the reduced  data of the additive synthesis
technique and second,  the unreduced data from the original analysis.
The  difference, if  any,  between the  analytical parts  of  the two
mapping algorithms for the additive and FM techniques will be  forced
to converge where possible. 

Extending the  notion  of mapping  algorithms, and  in  the light  of
interactions  which continue to  occur between the  two approaches to
synthesis, we  see  the formulation  of  a most  powerful  simulation
algorithm which combines the advantages of both approaches.  It would
enable  the user  to realistically simulate  any known  sound via the
most perceptually  relevant parametric specification.   The power  of
user-control  over the  FM  technique, in  that the  small  number of
parameters  and  time-functions   are  of   such  strong   perceptual
importance, provides  a  model for  the optimal  level of  parametric
specification.  The  degrees of  freedom for  the  additive synthesis
technique, which allows the synthesis of any  arbitrary configuration
of  functions for  the amplitudes  and  phases of  the components  of
tones,  provides the most  powerful means of simulating  any point in
the physical timbre space.  The course of future  research is towards
the convergence of the  most powerful attributes of both methods, and
the eventual formulation of an  algorithm for simulation which  truly
satisfies the criteria we outlined in the introduction of section
II: 1) the optimal use of computer resources, i.e., storage and
efficiency, 2) the perceptual validity of the results in terms of
naturalness,  3) the  general applicability  to the  widest range  of
cases  found in the repertoire of  instrumental timbres, 4) the level
of user-control of the algorithm such that parametric specifications
are perceptually meaningful, and 5) the efficiency with which
hypotheses may be verified.
.END
.NEXT PAGE